Optimizing Neural Networks via Koopman Operator Theory

Neural Information Processing Systems

Koopman operator theory, a powerful framework for discovering the underlying dynamics of nonlinear dynamical systems, was recently shown to be intimately connected with neural network training. In this work, we take the first steps in making use of this connection. As Koopman operator theory is a linear theory, a successful implementation of it in evolving network weights and biases offers the promise of accelerated training, especially in the context of deep networks, where optimization is inherently a non-convex problem. We show that Koopman operator theoretic methods allow for accurate predictions of the weights and biases of feedforward, fully connected deep networks over a non-trivial range of training time. During this window, we find that our approach is >10x faster than various gradient-descent-based methods (e.g. …).
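The core idea can be sketched with dynamic mode decomposition (DMD), a standard finite-dimensional approximation of the Koopman operator: fit a linear operator to snapshots of the network parameters collected during training, then roll that operator forward instead of taking further gradient steps. This is an illustrative sketch, not the authors' implementation; all names are ours, and the toy data is an exactly linear trajectory so the prediction comes out exact.

```python
import numpy as np

def fit_dmd(snapshots):
    """Fit a linear operator K with theta_{t+1} ~= K @ theta_t
    from a (T, d) array of parameter snapshots (least-squares / DMD)."""
    X, Y = snapshots[:-1].T, snapshots[1:].T  # (d, T-1) each
    return Y @ np.linalg.pinv(X)              # K = Y X^+

def predict(K, theta0, steps):
    """Roll the fitted linear model forward instead of running SGD."""
    theta = theta0.copy()
    for _ in range(steps):
        theta = K @ theta
    return theta

# Toy check: parameters that already evolve linearly are predicted exactly.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
traj = [np.array([1.0, 1.0])]
for _ in range(10):
    traj.append(A @ traj[-1])
snapshots = np.stack(traj)

K = fit_dmd(snapshots)
pred = predict(K, snapshots[0], 10)
print(np.allclose(pred, snapshots[10]))  # True
```

In practice the parameter trajectory of a real network is only approximately linear over a window of training, which is why the abstract's speedup claim is restricted to a non-trivial but finite range of training time.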


Optimizing Neural Networks via Koopman Operator Theory (Supplemental Material)

Neural Information Processing Systems

As discussed in Sec. 3 of the main text, the computational complexity of Koopman training is …. We assume that both standard training and Koopman training use simple matrix computation methods, and we note that none of these factors are relevant for Koopman training. The finite section method, Eq. 4, implies the run time complexity would be …. One can instead … Koopman operator(s) and evolve each partition separately from the others. In Sec. 3, we discussed when we think this "patching" approach should give small errors. The authors contributed equally. 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada.


Review for NeurIPS paper: Optimizing Neural Networks via Koopman Operator Theory

Neural Information Processing Systems

Additional Feedback: As noted above, one of the biggest drawbacks of this very interesting work at present is the very limited scope of the demonstrations. I believe this should be easy to address, and were this done I would feel comfortable increasing my score. It would also be useful to see a more detailed empirical study regarding the choice of the window (t1-t2) used to collect the data that informs the operator approximation. There are a couple of details that I would like to see to help improve reproducibility. In terms of related work, there are a couple of more tangential directions that come to mind where connections could potentially be made and that may be interesting for the authors to consider. There may be connections to the initial stages of gradient descent identifying subspaces in which most of the parameter evolution will occur (i.e. containing the lottery ticket weights).


Review for NeurIPS paper: Optimizing Neural Networks via Koopman Operator Theory

Neural Information Processing Systems

This paper provides a new perspective on neural network training based on Koopman operator theory (KOT). The paper received mixed reviews (top 50% - marginally above, reject, marginally above - accept, marginally below). On the positive side, and despite KOT being very old, the new perspective has a lot of potential: since KOT is linear, if one can find (or approximate) eigenfunctions, one could compute and analyze training dynamics more easily and make optimization more efficient. On the negative side, the paper is a first step and needs further development and experimental evaluation to demonstrate its value. Some reviewers also expressed that the paper lacks clarity.


Optimizing Neural Networks

#artificialintelligence

The goal of training an artificial neural network is to achieve the lowest generalization error in the least amount of time. In this article I'll outline a brief description of some common methods of optimizing training. Feature scaling is the process of scaling the input features such that all features occupy the same range of values. This ensures that the gradient of the cost function is not exaggerated in any particular dimension, which reduces oscillation during gradient descent. Oscillation during gradient descent means the training is not maximally efficient, as it is not taking the shortest path to the minimum of the cost function.
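The feature scaling described above can be sketched in a few lines (function names and example values are ours): standardizing each feature to zero mean and unit variance puts inputs of very different magnitudes on a comparable scale.

```python
def standardize(columns):
    """Scale each feature column to zero mean and unit variance.

    columns: list of feature columns, each a list of floats.
    """
    scaled = []
    for col in columns:
        mean = sum(col) / len(col)
        var = sum((x - mean) ** 2 for x in col) / len(col)
        std = var ** 0.5 or 1.0  # guard against constant features
        scaled.append([(x - mean) / std for x in col])
    return scaled

# Two features with wildly different ranges...
heights_cm = [150.0, 160.0, 170.0, 180.0]
incomes = [30_000.0, 50_000.0, 70_000.0, 90_000.0]

# ...occupy the same range after scaling, so neither dominates the gradient.
scaled_h, scaled_i = standardize([heights_cm, incomes])
```

After scaling, both columns have mean zero and the same spread, so no single dimension exaggerates the cost-function gradient.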


Optimizing Neural Network for Computer Vision task in Edge Device

Ranjith M S, Parameshwara S, Pavan Yadav A, Shriganesh Hegde

arXiv.org Artificial Intelligence

With the rise of artificial intelligence, it has become clear that deep-learning-based approaches give satisfactory results compared to numerous other state-of-the-art schemes that are hand-engineered for computer vision tasks. The features learned by convolutional neural networks outperform hand-engineered feature-based methods like SIFT [Mikolajczyk and Schmid (2004)] and HoG [Dalal and Triggs (2005)] in computer vision tasks like image classification and object detection. The availability of large datasets and powerful computation devices has made it possible to train large and complex neural networks to obtain the desired performance on many computer vision tasks. The large number of open-source models pre-trained on large datasets like ImageNet [Challenge], MS-COCO [Lin et al. (2014)], and SVHN [Netzer et al. (2011)] provides many useful filters; in particular, the features learned in the initial layers help greatly in transfer learning. In transfer learning, most of the time only the last few layers of a pre-trained model are modified and trained, which counters the problem of having little data to a certain extent. Currently, a variety of embedded systems are deployed, but the usage of neural networks is limited in edge devices like microcontrollers and the Raspberry Pi. Household devices like refrigerators and washing machines use sets of logic rules for their automatic operation. By optimizing a network trained on a suitable dataset, this traditional way of control can be replaced by intelligently monitoring the systems with the power of AI and neural networks [Ranjith M S and Parameshwara (2020)]. Optimizing the convolutional neural network architectures generally used in computer vision tasks, like ResNet [He et al. (2016)], DenseNet [Huang et al. (2017)], and AlexNet [Krizhevsky et al. (2012)], allows creating many useful applications.


Optimizing Neural Networks -- Where to Start?

#artificialintelligence

We'll use Google Colab for this project, so most of the libraries are already installed. Since we'll train neural networks, it's important to use a GPU to speed up training. To enable the GPU, just go to "Runtime" in the dropdown menu and select "Change runtime type". You can then verify the connection by hovering the mouse over "CONNECTED" in the top right corner. Although we could download the dataset manually, for reproducibility let's download it from Kaggle. Since we need to do this using Kaggle's API, we'll first create an API token by visiting the "My Account" page on Kaggle.


Optimizing Neural Networks in the Equivalent Class Space

Meng, Qi, Chen, Wei, Zheng, Shuxin, Ye, Qiwei, Liu, Tie-Yan

arXiv.org Machine Learning

It has been widely observed that many activation functions and pooling methods in neural network models have a (positive-)rescaling-invariant property, including ReLU, PReLU, max-pooling, and average pooling, which makes fully-connected neural networks (FNNs) and convolutional neural networks (CNNs) invariant to (positive) rescaling operations across layers. This may cause non-negligible problems for their optimization: (1) different NN models can be equivalent, yet their gradients can be very different from each other; (2) it can be proven that the loss functions may have many spurious critical points in the redundant weight space. To tackle these problems, in this paper we first characterize the rescaling-invariant properties of NN models using equivalent classes and prove that the dimension of the equivalent class space is significantly smaller than the dimension of the original weight space. Then we represent the loss function in the compact equivalent class space and develop novel algorithms that conduct optimization of the NN models directly in the equivalent class space. We call these algorithms Equivalent Class Optimization (abbreviated as EC-Opt) algorithms. Moreover, we design efficient tricks to compute the gradients in the equivalent class, which add almost no extra computational complexity compared to standard back-propagation (BP). We conducted an experimental study to demonstrate the effectiveness of our proposed new optimization algorithms. In particular, we show that by using the idea of EC-Opt, we can significantly improve the accuracy of the learned model (for both FNNs and CNNs), as compared to using conventional stochastic gradient descent algorithms.
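The rescaling invariance the abstract describes can be checked directly: for ReLU, scaling one layer's weights and biases by any c > 0 and dividing the next layer's weights by c leaves the network function unchanged, so gradient descent can wander along a redundant direction of weight space. A minimal single-neuron sketch (weight values and names are ours):

```python
def relu(x):
    return max(0.0, x)

def two_layer(x, w1, b1, w2, b2):
    """A single hidden ReLU unit followed by a linear output."""
    return w2 * relu(w1 * x + b1) + b2

c = 10.0  # any positive rescaling factor
for x in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    original = two_layer(x, w1=1.5, b1=0.2, w2=-0.7, b2=0.1)
    # Scale the first layer by c, the second by 1/c: relu(c*z) = c*relu(z)
    # for c > 0, so the c's cancel and the function is unchanged.
    rescaled = two_layer(x, w1=1.5 * c, b1=0.2 * c, w2=-0.7 / c, b2=0.1)
    assert abs(original - rescaled) < 1e-12
```

Because the two weight settings implement the same function but sit at different points of weight space (with different gradients there), optimizing in the quotient space of such equivalent classes, as EC-Opt does, removes this redundancy.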